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Abstract 

A hierarchical multigrid algorithm for efficient 
steady solutions to the two-dimensional compress- 
ible Navier-Stokes equations is developed and 
demonstrated. The algorithm applies multigrid in 
two ways: a Full Approximation Scheme (FAS) for a 
nonlinear residual equation and a Correction Scheme 
(CS) for a linearized defect correction implicit equa- 
tion. Multigrid analyses which include the effect of 
boundary conditions in one direction are used to es- 
timate the convergence rate of the algorithm for a 
model convection equation. Three alternating-line- 
implicit algorithms are compared in terms of effi- 
ciency. The analyses indicate that full multigrid 
efficiency is not attained in the general case; the 
number of cycles to attain convergence is dependent 
on the mesh density for high-frequency cross-stream 
variations. However, the dependence is reasonably 
small and fast convergence is eventually attained 
for any given frequency with either the FAS or the 
CS scheme alone. The paper summarizes numeri- 
cal computations for which convergence has been at- 
tained to within truncation error in a few multigrid 
cycles for both inviscid and viscous flow simulations 
on highly stretched meshes. 
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Introduction 

There has been an explosive growth in the use of 
computational fluid dynamics methods in the air- 
craft design cycle over the past twenty-five years. 
Recently there has been an emphasis on three- 
dimensional Navier-Stokes simulations over com- 
plex configurations; computations with 10-20 mil- 
lion grid points are commonplace in focused appli- 
cations. Even with the advent of more powerful com- 
puters, algorithms that attain optimal convergence 
rates are important to enabling these computations 
to be accomplished in a reasonable wall-clock time. 
An optimal method is one in which the arithmetic 
operations to attain a solution to within truncation 
error scale as O(N), where N is the number of equa- 
tions to be solved. Here, N is the number of finite 
volumes in the solution ( Npy ) times the number of 
conservation equations for each finite volume (ra); 
i.e. , N = mNpy. Generally, the total operation 
count can be expressed as c N Wmwu where Wmwu 
is the operation count corresponding to one mini- 
mal work unit (MWU), i.e., the simplest possible 
discretization of the equations to the order desired, 
and c is a constant that differentiates one optimally 
converging method from another. 1 One method of 
attaining optimal convergence rates is the multigrid 
method. For elliptic equations, textbook efficiencies, 
which attain convergence in four to five residual eval- 
uations, are possible. 1,2 For hyperbolic equations, 
O(N) methods have been developed for the incom- 
pressible Euler equations 1,3,4 and for compressible 
Euler equations using either the Full Approxima- 
tion Scheme (FAS) 5,6 or the defect correction (DC) 
scheme. 7-10 Multigrid solvers for viscous flows have 
also been developed using these approaches. 11-14 

For the compressible Navier-Stokes equations, 
textbook efficiencies have not been attained for gen- 
eral situations; the barriers which need to be over- 
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come are addressed by Brandt . 2 One of the prin- 
cipal difficulties for complex-geometry applications 
has been the need for very highly stretched grids to 
resolve viscous flows near the body, with correspond- 
ingly bad (and unintended) aspect ratios in other re- 
gions. As an example, for a transonic wing with sep- 
arated flow, the convergence rate for a widely used 
multigrid method based on the FAS algorithm and 
a multistage Runge-Kutta scheme 13 requires on the 
order of 1500 residual evaluations to attain conver- 
gence of the lift to within one percent of the asymp- 
totic value . 15 Recently, improvements that use line- 
implicit algorithms and semi-coarsening approaches 
have been demonstrated for these applications . 16-1 9 

The purpose of this paper is to introduce an al- 
gorithm that uses full-coarsening multigrid to accel- 
erate convergence for viscous applications. All of 
the basic elements of the method (multigrid, line 
relaxation, upwind differencing, defect correction, 
etc.) are well known. The algorithm applies the 
FAS scheme to the nonlinear residual equations and 
the correction scheme (CS) multigrid to the lin- 
earized implicit equations. The methodology uses 
alternating-line-implicit methods, although the es- 
sential feature is the requirement for the solution to 
a local block Jacobian matrix of size m at each grid 
point. Thus the operation counts are 0{Npym 3 ) 
and the computational work will scale as 0(Nm 2 ). 
This is in contrast to the class of O(N) algorithms 
discussed by Brandt 2 which decouple the equations 
into separate scalar contributions, each of which is 
treated optimally. The computations are supported 
by analysis of convergence for a simple model con- 
vection problem which shows many of the essential 
features of the resulting algorithm. Comparisons 
are made with the baseline solver implemented in 
a widely used production code CFL3D . 14 Because 
the algorithm contains many elements of existing 
methods, the methodology should be able to be in- 
corporated into production codes and to accelerate 
convergence for realistic applications. 

Baseline Method 

The steady-state results are obtained with a finite- 
volume approach based on an upwind-biased treat- 
ment of the convective and pressure (Euler) terms 
and central differencing for the viscous terms. The 
method has been implemented in the CFL3D code, 
used widely for large-scale computations and de- 
scribed by Krist, et al . 14 Only a few basic features 
are cited here. The Riemann interface solver is the 
flux-difference-splitting method and the K-scheme of 
Van Leer 20 is used for state- variable extrapolations. 
Convergence to steady state is accelerated through 


the Full MultiGrid (FMG) process, i.e. , mesh se- 
quencing and FAS multigrid with an approximately 
factored implicit method as the relaxation scheme. 
In the approximate factorization (AF) method, the 
full matrix is replaced with a sequence of simpler 
operators, each of which is a block tridiagonal or 
pentadiagonal operator. However, the baseline AF 
method is used almost exclusively in its diagonal 
form . 21 The multigrid method was demonstrated to 
yield grid-independent convergence rates for Euler 
simulations using the flux-vector splitting method 
by Anderson, et al . 5 The scheme has been used 
routinely to accelerate the solution for viscous flow 
using flux-vector splitting methods for the dissipa- 
tion, with a spectral radius approximation to the 
viscous Jacobians. The scheme is generally applied 
as a W(1,0 ) 5 FAS cycle using a Courant number of 
5. For time-dependent simulations, because of the 
severe time-step limitation of the method, subiter- 
ations have been used to improve the accuracy and 
stability of the implicit scheme, as in Rumsey, et 
al . 22 

Multigrid Method 

The present algorithm uses multigrid in two ways. 
The first way is through an outer FAS multigrid 
cycle to solve the second-order-accurate, nonlinear 
steady-state residual operator. The second way is 
through an inner iteration to solve the first-order- 
accurate, linearized implicit operator. We describe 
the multigrid methods for the two approaches be- 
low by means of a two-grid approach, in which the 
fine grid is denoted by superscript h and the coarse 
grid by superscript 2 h. The coarser grid equations 
are themselves solved with 7 cycles of the algorithm 
applied recursively, where 7 = 1 corresponds to a 
V-cycle and 7 = 2 to a W-cycle. 

FAS Multigrid Cycle 

The second-order-accurate steady-state residual 
operator to be solved on the finest grid is defined 
as 

TL h (Q h ) = 0 (1) 

where this equation represents the inviscid convec- 
tive and pressure terms and the viscous diffusion and 
heat transfer terms; Q h represents a vector of size 
m at each of the Npy hnite- volumes in the domain. 
After relaxation(s) of the fine-grid operator to ob- 
tain Q h , the coarse-grid equation at level 2 h to be 
solved for a correction to the fine grid is 

R 2/l (Q 2/l ) = R 2h (TZQ h ) - TZR h (Cl h ) (2) 
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where 1Z denotes a restriction operator for transfer 
of information to the coarser grid and the ' super- 
script denotes a most recently available value. The 
correction from the coarser grids is prolonged to the 
finer grids as 


Q h <- Q h + V(Q 2h - TZQ h ) (3) 

where V denotes a prolongation operator. The FAS 
cycle described above is used extensively for cur- 
rent Euler and Navier-Stokes solvers. The differ- 
ences in convergence between solvers lie chiefly in 
their choice of relaxation (smoothing) scheme, such 
as the approximate-factorization method 5 or multi- 
stage Runge-Kutta methods with implicit residual 
smoothing. 6, 13 

From the standpoint of a Newton method, the 
fine-grid correction can be written 

r)"R ^ 

[—](AQ h ) = -R h (Q h ) (4) 

where the solution is updated as Q h Q h + AQ h . 
The implicit equation is a large-banded matrix equa- 
tion which is usually approximated for solution with 
two approaches. The first approach is to use an ap- 
proximate linearization of the residual; commonly, 
the linearization of the residual uses first-order dis- 
cretizations for the convective and pressure contri- 
butions. For this approach, we can write the implicit 
scheme as 


Q-R h 

[^-±](AQ h ) = -R l t(Q h ) ( 5 ) 

where the subscripts t and d denote some desired 
“target” and “driver” schemes on the right and left 
sides, respectively, of the equation. Note that this 
equation is the defect correction form of the equa- 
tions 7,8,10 written in delta form for the update. This 
“defect” in the implicit approximation leads to some 
interesting consequences for the algorithm, as dis- 
cussed subsequently, even if we make no further ap- 
proximations. 

The second approach is to solve the full matrix 
equation iteratively or with a noniterative approxi- 
mate factorization. For instance, Anderson, et al. 11 
used red-black block-matrix subiterations (usually 
15) with point (m-block matrix inversions) relax- 
ations to approximate the solution of the fine-grid 
implicit equation for unstructured grids. Thus the 
efficacy of the solver becomes a trade-off between the 


additional work of the subiterations and the reduced 
approximations to the implicit equation. 

If the implicit terms are differenced with first- 
order-accurate upwind discretizations, the resulting 
equations are block diagonally dominant. There- 
fore, these equations can be solved efficiently with 
multigrid methods and standard relaxation meth- 
ods used for solution of iterative equations, such as 
Jacobi and Gauss-Seidel relaxation. With a second- 
order-accurate discretization, the implicit equations 
are only block diagonally dominant for a CFL num- 
ber of unity. In either case, because the implicit 
equations are linear, a CS multigrid method can be 
used, as described below. 

CS Multigrid Cycle 

During the iterative process to solve the linear 
implicit equation above, the second-order accurate 
residual is held fixed, defined as h h = — R* (Q ft ). 
The equations are first relaxed on the fine grid for 
an approximate solution using a subiteration counter 
/, (/ = 0, 1, . . . , N s - 1), as 


[^] [(AQ h ) l+1 - (AQ h ) 1 ] 

fljy h 

=b "- [ w i]{A(ih)l (6) 

or equivalently, 


{ dn h d 

[ dQ h 


](AQ a ) ,+1 


= h h 


dR h d 

dQ h 


dR h d 

dQ h 


](AQ a )' 


( 7 ) 


where the approximation to the implicit equation 
on the left side is denoted with an overline notation 
and (AQ ft )° = 0. The coarse grid also supplies a 
correction to the linear system through solution of 
the coarse grid correction equation, defined below. 


f)p2 h QD h 

3Q^[AAQ 2A ] = K[ b h - [-^](AQ a )] (8) 

where the latest value of (AQ ft ) is used on the right. 
The correction to AQ 2/l from the coarser grid is pro- 
longed to the finer grids as 


AQ ft f- AQ ft + V [A AQ 2/1 ] (9) 

All of the boundary conditions are completely lin- 
earized and incorporated into the defect correction 
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operator. In the CS multigrid, the coarser grid 
implicit matrices are found by restricting the cor- 
responding finer grid implicit matrix contributions 
with the result that the above linearization need only 
be done for the fine grid. The linearization includes 
a time term which can be ramped from small val- 
ues at impulsive starts from freestream conditions to 
Courant numbers on the order of 300-500. The lin- 
ear system is easier to solve if the time step is small, 
and there is little to be gained in the second-order 
residual convergence for Courant numbers beyond 
these values for most flows. 

Convergence of Defect Correction 

The DC method can be written in terms of target 
and driver operators L t and L d , respectively, on a 
given grid h as 


enough that the first-order scheme satisfies the or- 
der property above, for which convergence would be 
expected in just a few iterations. The second regime 
corresponds to a mesh for which the second-order 
scheme satisfies the order property but the first- 
order scheme does not. It is this regime for which the 
slowest convergence would be expected. The third 
regime corresponds to a mesh coarse enough that 
neither the first-order nor second-order scheme sat- 
isfies the order property above; in this regime, con- 
vergence to truncation error is nonetheless generally 
rapid because the truncation errors are 0(1). 

These regimes can be classified sharply for the k 
family if we consider convection, Lu = u x + tu y , 

Td{u) = ~^[u xx +tuyy\ + 0(h 2 ) (16) 


L d (u n+1 ) = L d (u n ) -L t (u n ) (10) 

where the operators are designed to approximate the 
actual partial differential operator to within an order 
property 


Ld — L + Td — L + 0(h) (11) 


Lt — L + Tt — L + 0(h 2 ) (12) 

Now substituting from the above into Eq. (10), defin- 
ing u = Uexact + e where T(u e xact) = 0 and e is the 
truncation error, then 


(L + T d )(e n+1 ) = (L + T d )(e n ) - L(e n ) + 0(h 2 ) 

= T d (e n ) + 0(h 2 ) (13) 


If’ 


assume that (L + Td) 1 = L 1 + O(h ) , then 


T t (u) = h ^ 4 [u xxx + tUyyy ] + O (Z^) (17) 

Considering an exact solution with cross-stream fre- 
quency u>, corresponding to u = exp (iu>(y — tx)), the 
scheme attains its design accuracy when 

N = ly w2 (l + l)l < \ ( 18 ) 


|r t | = | ft2 * (K ~ 1/ 3 ) u,3ft2_i)|< * 


(19) 


The inequalities on the right denote approximate 
values observed in parametric calculations; for val- 
ues satisfying the inequality, the scheme attains its 
design accuracy — an asymptotic error reduction of 
2 P as the mesh is refined by a factor of two. Thus 
the three regions can be classified as below: 


e n+1 = L~ 1 T d (e n ) + 0(h 2 ) (14) 

Telescoping the error terms from an arbitrary start- 
ing error e°, then 

L(e n ) = L(L- 1 T d ) n (e 0 )+O(h 2 ) (15) 

Because we need only converge the solution until 
truncation error, L(e n ) = 0(h 2 ), we can catego- 
rize the convergence of defect correction into three 
regimes. The first regime corresponds to a mesh fine 


I : M < \ 

II : \r d \ > \ and |r t | < \ 

HI : \n\ > i 

The three regions denote the disparity between the 
dissipation of the driver and target schemes and 
are a restatement of the “survival distances” asso- 
ciated with convection schemes derived by Brandt 
and Yavneh 4 in studies of the incompressible Navier- 
Stokes equations, for standard schemes of the type 
considered here as well as for hybrid schemes of im- 
proved accuracy and convergence. 
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Analysis of Convergence 


Model Problem 

To analyze convergence, we consider a model con- 
vection equation corresponding to flow at some angle 
of attack a to a unit square, 


s x + t s y — 0, x G [0, 1] , y E [0, 1] (20) 

where t = tan a. We consider a finite- volume 

scheme for Eq. (20) on a uniform Cartesian grid with 
spacings h x = 1/N X and h y = 1/N y in the x- and 
^-directions, respectively, as 


8 x s h +tS y s h = 0 ( 21 ) 

where Sj k is defined at the locations xj = (j — l/2)h 
and y k = (k - 1/2 )h for (j, k) = 1,2,..., (N x ,N y ). 
Assuming periodicity of the solution in y, Sj k = 
Uj exp (iui y yk) where ui y is a given frequency in 
the ^-direction. The exact solution to Eq. (20) is 
s(x,y) = /(£) where £ = y — t x, corresponding in 
this case to [«j] e xact = exp (— it ui y Xj). The set of 
discrete frequencies ui y realizable on a given grid can 
be defined in terms of 9 y = ui y h y as 


' y xo) + ci exp (— i t uiy X-i) 
l exp (— i t uj y xo) 

0 

0 

(25) 

Sy = — [ci exp (~i28 y ) + 

fly 

c 2 exp (-iOy ) + c 3 + c 4 exp (i0 y )\I (26) 

where S y is a diagonal matrix and is lower trian- 
gular for the fully upwind schemes, k C {— 3,— 1}. 
The coefhcients for kE [— 1, 1] are 

{ci, c 2 , c 3 , c 4 } = 1{1 - k, 3k - 5, 3(1 — k), 1 + k } 
and for k = — 3 are 

{ci, c 2 , c 3 , c 4 } = {0, -1, 1, 0} 
and 2 i_ 3 reflect the incorporation of nonreflecting 
boundary conditions at the right for the upwind- 
biased schemes, c E (-1,1]. 

Defect Correction 

The defect correction scheme to solve this equa- 
tion can be written in delta form, Am = u n+1 — u n , 
where the superscript n denotes a cycle (iteration) 
counter, as 


/ = 


c 2 exp (— i t Ll 

C] 


27 tI 


/ = 0 , 1 , 


,Ny~l 


( 22 ) 


L h d (\u h ) = -L h t (u h ) -f t h n = 0,1, 


,N C — 1 
(27) 


The numerical scheme can be written in matrix 
notation as 


[S h x + tS h y )u h = L h u h = -f h (23) 

where u h is the vector of unknowns at the interior 
points and f h is the vector associated with boundary 
conditions imposed at the inflow from the exact so- 
lution. We consider either schemes of first-order ac- 
curacy or one of the family of k schemes, /t£ [-1,1]. 
As a shorthand notation, we refer to the first-order 
scheme as k = — 3. The matrices are 



’ c 3 

c 4 

0 

0 

0 

0 


C 2 

C 3 

c 4 

0 

0 

0 

1 

Cl 

C 2 

C 3 

c 4 

0 

0 

hx 

0 





0 


0 

0 

Cl 

C 2 

c 3 

c 4 


0 

0 

0 

Cl 

C 2 

C 3 


Defining the algebraic error e h as the difference be- 
tween the exact discrete solution and the current 
approximation, u h = M/ xact + e h , then Eq. (27) can 
be written 

L h d ( Ae h ) = -L h t {e h f 

for n = 0,1,... ,N C - 1 (28) 

Subiteration Scheme 

An exact solution of this equation at each cycle 
would correspond to an unfactored solver and is eas- 
ily accomplished here with any of the fully-upwind 
target schemes. However, as a model for the process 
when we solve the coupled system of Euler/Navier- 
Stokes equations, we consider an approximate solu- 
tion of the driver operator, including the effect of 
subiterations. 

L5[(Ae A )'+ 1 -(Ae A )'] = ~L h t [e h f 

~ L h d (A e h ) 1 

for 1= 0,1,... ,N S - 1 (29) 
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where N s = 1 corresponds to a noniterative approx- 
imation, Eq. (28) is solved if the subiteration pro- 
cess converges, and (Ae ft )° is taken as zero at the 
start of the process. In addition to the unfactored 
scheme, we consider three approximations to L 1 ^, two 
of which are an Alternating Line- Jacobi (ALJ) and 
a spatially-factored Approximate Factorization (AF) 
scheme (also see Appendices I-II) 


L h \ A LJ 




(±- + t(S h y ) D + (S h x ) D ) 

-1 


(±+tS h y + (S h x ) D ) 

(30) 

L*\af = 

^ +sh ^r^ +t ^ 

(31) 


where superscript D denotes the diagonal contribu- 
tion from the matrix arising from discretization in 
x and y; for the differencing in y, the diagonal term 
is the exp (0) contribution to the difference symbol, 
Eq. (26). The third approximation to L h d is the Al- 
ternating Line Red-Black (ALRB) scheme (see Ap- 
pendix III), which can be represented as 

V\alrb = (N- 1 + N; 1 - N-^N; 1 )- 1 (32) 


( e i ) n exp (iui y y) 
(4) n exp(i(u,y + fi)y) 


(35) 


where 


' (e?)" ' 

( „h \n 

= G^ 

[ (ef)° 1 

( /D h\ 0 

L ( e 2 ) J 


L ( e 2 ) J 


For single-grid relaxations, the matrix G is a diag- 
onal matrix; off-diagonal entries arise for red-black 
relaxations and for intergrid transfers of informa- 
tion. The error amplification matrix per cycle for 
a multigrid cycle with v\ relaxations before restric- 
tion, v 2 relaxations after prolongation, and an exact 
solution on the coarser mesh 2 h can be written as 


G = [I -V {L 2 t h )~ 1 TZL'l] S [ ^ l] (37) 


where 


L 


h 

t 


L t{°y) 0 

0 L t A (0 y +7T) 


(38) 


TZ = [ R cos(9 y /2) Rsm(6 y /2) ] (39) 


where N y and Nj, represent sweeps through the 
mesh in which the y and x lines are solved implicitly 
in a red-black fashion, respectively. The equations 
use an artificial time term defined in terms of the 
CFL number as 

ii = ( i + ‘i }/CFL <»> 

where the time term serves principally as a relax- 
ation device for steady-state solutions. For all of the 
analyses here, we use CFL = oo for the ALJ and 
the ALRB schemes. 

Error Amplification Matrices 

It is usual to construct the amplification of the 
error for the red-black scheme or the multigrid 
scheme considering two frequencies at a time, u) y and 
u> y + n/hy. Thus the original error distribution 


( e i )° exp (iui y y) 
(4)° exp (i(w„ + f-)y) 


(34) 


becomes 


V = 


P { 3 cos(6» y /2) + cos(36» y /2))/4 
P{ 3sin(6» y /2) - sin(36» y /2))/4 


(40) 


The restriction and prolongation matrices in x are 
defined in Appendix IV. An alternate formulation of 
the amplification matrix is defined as 


G* = TV GV* (41) 


where R* and P* are higher order restriction and 
prolongation operators, also defined in Appendix IV, 
and the matrices TV and V* are constructed assum- 
ing no aliasing errors occur in the transfer of infor- 
mation in the periodic direction y. 

R* 0];V*= P * 

In this formulation, even for the single-grid scheme, 
the amplification matrix accounts for the transfer 
of information between the coarser and finer mesh. 
The equation represents the prolongation of fine grid 
low-frequency errors on the coarser mesh to the fine 
mesh, their amplification on the fine mesh, and a 


TV = 
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subsequent restriction down to the coarser mesh. 
The alternate formulation reduces the size of the ma- 
trix eigenvalue problem to be solved from 2 N x x 2 N x 
to N x /2xN x /2 and generally yields results which are 
within 1 cycle of the standard formulation. Larger 
differences arose when the maximum amplifications 
occured near the O y /ir = 0.5 boundary; these differ- 
ences have not been resolved, but in those instances 
where differences do occur, the standard amplifica- 
tion matrix is used. 

The relaxation matrix S accounts for l = N s subit- 
erations as 


S = S'$ + S [ ' ] (42) 


S' = S [ '- 1] +S [ '- 2] + ... + S [1] + I (43) 


<h=(L^)- 1 (L^-Lf) (44) 


H = I-L2 _1 L3 (45) 

where the superscript [n] on a term denotes the term 
raised to the nth power when there might be con- 
fusion with the n or l superscript. The matrix cor- 
responding to amplification of the discrete residual 
can be written as 


H = L t A G(L t A )“ 1 (46) 


on the L 2 norm of either the error or the residual 
and correlate the number of cycles to reduce them 
by specified amounts. 

Stopping Criteria 

By far, the most important bound is that for the 
error, because we need only converge the numerical 
scheme to within some measure of truncation error 
on each grid. If we reduce the algebraic error on each 
mesh by a factor /, the total error on subsequent 
meshes can be expressed as u h = « e xact + T h (1 + /?) 


? = ,i + ' 


, 2 P , 


(47) 


If we somewhat arbitrarily invoke that [3 = 1/8, so 
that the algebraic error is 1 /8th of the truncation 
error, then the error reduction values / are 10 and 
28 for the p = 1 (first-order) and p = 2 (second- 
order) schemes, respectively. We apply this criteria 
for the local error to the L 2 error norms. In the 
development of Eq. (47), we assume that the pro- 
longation operator used in the interpolation of the 
solution from the coarser mesh is nearly unity, con- 
sistent with a high-order interpolation. If we com- 
bine that with a high-order restriction operator, we 
can enforce that a given fine-grid algebraic error as 
interpolated to the coarse mesh be reduced by the 
factor / through examination of the G* amplifica- 
tion matrix given above. In practice, the residual 
is usually monitored, and grid-independent conver- 
gence rates for the residual have been used as an 
assessment of whether the multigrid is functioning 
properly. Hence, we show some results for reduction 
of the residual norms. 


Error Bounds 

The usual bounds considered for amplification of 
errors are the spectral norm and the L 2 norm. The 
spectral norms of G and H are the same because 
they are related by a similarity transformation, al- 
though we find in practice that the computation of 
P( H) is subject to round-off errors. For elliptic equa- 
tions, Brandt 23 has shown that these bounds are at- 
tained for general domains using local mode (fully 
periodic) analyses as long as the cycle is supple- 
mented with additional (and negligibly small) pro- 
cessing at and near the boundary. For hyperbolic 
equations, the boundary has a more global influ- 
ence; in the case of convection, information prop- 
agates from the boundary into the domain. For con- 
vective equations, we find the spectral norm to be 
not very useful, since it often is reached only after a 
large number of iterations and may not be observed 
in practice (see Appendix V). Thus we concentrate 


Efficiency Measure 

In all cases, we show the number of updates of the 
solution on the fine grid, defined as 

Nu = N c N s {v 1 + v 2 ) 

as the efficiency measure. This is only approximate, 
because it only accounts directly for the line inver- 
sions done through either subiterations or additional 
multigrid relaxations. The residual evaluations and 
any work done on coarser grid are not taken into ac- 
count. We can also define an effective norm as the 
norm per update of the solution, i.e. , 

(||G[ Ar J|| 2 ) e//ect „ e = (||G[ iV J|| 2 ) 1 / iv - (48) 

With the above models, there are a large number 
of possible parametric studies of the type that have 
been conducted extensively for elliptic equations; the 
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general approach is extendable to systems of equa- 
tions, three dimensions, and semi-coarsening multi- 
grid schemes. We explore only a few parameters 
below, focusing on the effect of convergence for the 
second-order-accurate discretizations, including the 
effects of relaxation scheme, subiterations, and FAS 
multigrid on the convergence. 

Convergence Results 
Unfactored Scheme 

The number of updates for convergence with the 
unfactored scheme is considered an upper bound for 
performance with defect correction. The direct so- 
lution of the associated large-bandwidth equation is 
not viable from an efficiency standpoint, but with 
the first-order implicit equations, convergence can 
be attained in only a few multigrid cycles. 7-9 For 
the unfactored scheme, we show results with n t = 0 
and Kd = —3 for a typical case, a = 30 deg, both 
with and without multigrid. We examine the num- 
ber of cycles to reach convergence of 1 1 G* 1 1 2 below 
1/28, with 9 y over all possible discrete frequencies. 
The results are shown in Table 1. There is a growth 
in the number of cycles in either case because of 
the defect correction approximation, i.e. , the dispar- 
ity between the target second-order and the driver 
first-order scheme. The growth is, however, not ex- 
plosive. The multigrid scheme is effective, even for 
this unfactored implicit scheme, in reducing both the 
error and the residual norms in comparison to the 
single-grid scheme. The improvement approaches a 
factor of two as the grid is refined. On the finest 
mesh, the number of cycles to convergence for the 
lowest frequency considered and its associated har- 
monic, Oy/n = (—0.0156,0.9844), was five for the 
single-grid scheme and four for the multigrid scheme; 
the slowest convergence occurred at frequencies of 
Oy/n = (-0.25,0.75) and 6 y / n = (-0.343,0.657), 
respectively, for the two schemes. The average num- 
ber of cycles over all the frequencies considered 
was 15.6 and 8.8 for the single-grid and multigrid 
schemes, respectively. Results obtained by moni- 
toring convergence of 1 1 H 1 1 2 below 10 -4 showed a 
similar growth in the number of cycles for both sin- 
gle and multigrid schemes, although the multigrid 
showed only a 20-percent improvement in the num- 
ber of cycles to attain this level of residual conver- 
gence on the finest mesh. 

The above comparison represents a worst case sce- 
nario for the bound, because all frequencies need to 
be reduced by a constant amount. In fact, through 
the FMG process, the troublesome frequencies may 
have very small amplitudes in the starting solutions 
for a given grid. Additionally, those frequencies 


N x 

Single Grid 

Multigrid 

8 

7 

7 

16 

9 

8 

32 

12 

10 

64 

18 

12 

128 

25 

13 


Table 1. Number of updates, Nu, to reduce error 
norm 1 1 G* 1 1 2 < 1/28 for the unfactored DC scheme; 
\6 y \ < 2-7T, K t = 0, Kd = —3, a = 30 deg 


N x 

Single Grid 

Multigrid 

M 

M 

8 

3 

4 

36 

7 

16 

8 

8 

18 

2 

32 

12 

7 

9 

0.5 

64 

10 

4 

4 

0.1 

128 

6 

4 

2 

0.03 

256 

5 

4 

1 

0.007 


Table 2. Number of updates, Nu, to reduce error 
norm 1 1 G* 1 1 2 < 1/28 for the unfactored DC scheme; 
uiy = 871 -, Kt = 0 , Kd = —3, a = 30 deg 

which are unresolved need only be reduced by less 
than 1/28 to be within truncation error. Thus, in 
Table 2, we show the number of updates considering 
only a single frequency, corresponding to u> y = 87T. 
We also show r values from Eq. (18-19) in the table; 
the N x = N y = 32 grid is the first grid that would 
show close to the desired order property of a factor 
of four reduction in the error as the grid is refined 
by a factor of two in the x and y directions. The 
first order scheme does not provide resolution over 
the entire domain except on the very finest mesh; 
note the span of grids between accuracy of the first 
order and second order operators. The asymptotic 
convergence is quite good, as predicted by the de- 
fect correction asymptotic analysis, because eventu- 
ally the solution is resolved even with the first-order 
driver scheme. In this limit, the multigrid is nei- 
ther effective nor needed, as the single grid converges 
the error and residual in a few cycles. The bound- 
ary between regions II and III exhibits the slowest 
behavior; the multigrid method improves the con- 
vergence because the truncation error of the coarser 
mesh target operator is better than the truncation 
error of the fine mesh driver operator. Although not 
shown, results with the K t = — 1 scheme were similar 
to those above, although the number of cycles were 
30-50 percent lower. 

Comparisons with the AF Scheme 

For the noniterative (N s = 1) AF scheme, we first 
show results with Kd = K t = —3 for a = 45 deg, 
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Method 

N x 

IIGII2 

l|H || 2 

p( G) 

SG 

8 

.91 

.91 

.5 

SG 

16 

.97 

.97 

.5 

SG 

32 

.99 

.99 

.5 

SG 

64 

.99 

.99 

.5 

MG 

8 

.67 

.64 

.57 

MG 

16 

.82 

.73 

.57 

MG 

32 

.98 

.86 

.58 

MG 

64 

1.13 

1.01 

.58 


Table 3. Norms after one update for the AF scheme; 
1 0 y | < 27 r, Kt = Kd = — 3, a = 30 deg 


N x 

SG 

Pred. 

SG 

CFL3D 

MG 

Pred. 

MG 

CFL3D 

8 

30 

28 

18 

16 

16 

44 

40 

18 

17 

32 

68 

61 

19 

17 

64 

112 

101 

19 

17 

128 

- 

175 

- 

17 


Table 4. Number of updates, Nu, to reduce 1 1 H 1 1 2 
by 10 -4 for the AF single-grid (SG) and multigrid 
(MG) schemes; \9 y | < 27T, K t = Kd = — 3, a = 45 deg 


Method 

N x 

l|G || 2 

l|H || 2 

P( G) 

SG 

8 

.96 

.95 

.79 

SG 

16 

.99 

.99 

.79 

SG 

32 

« 1 

« 1 

.80 

SG 

64 

« 1 

« 1 

.80 

MG 

8 

1.15 

1.02 

.79 

MG 

16 

1.76 

1.74 

.81 

MG 

32 

2.65 

2.77 

.81 

MG 

64 

4.05 

4.01 

.82 


Table 5. Norms after one update for the AF scheme; 
1 0 y | < 27t, K t = 0, Kd = —3, a = 30 deg 


N x 

SG 

Pred. 

SG 

CFL3D 

MG 

Pred. 

MG 

CFL3D 

8 

47 

37(29) 

40 

37(26) 

16 

64 

51(45) 

44 

41(30) 

32 

91 

70(67) 

50 

43(39) 

64 

139 

106(104) 

64 

51(49) 

128 

- 

170(165) 

- 

65(67) 


Table6. Number of updates, Nu , to reduce 1 1 H 1 1 2 by 
10 -4 for the AF single grid (SG) and multigrid (MG) 
schemes; \9 y | < 27T, K t = 0 , kj = —3, a = 30 deg 


both with and without multigrid. The CFL number 
is selected as 1 + tan a because then p( G) = 0.5, in- 
dependent of the mesh. The norms shown in Table 3 
after one cycle (N c = 1) indicate a spectral norm for 
multigrid that is actually slightly higher than that 
of the single grid. The L 2 norms are greater than 
unity for the multigrid scheme; the single-grid norms 
are all below unity. The number of updates to reach 
convergence of 1 1 H 1 1 2 below 10 -4 , with 9 y over all 
possible discrete frequencies, is shown in Table 4; 
several convergence calculations were not performed 
for the highest grid density and are denoted with 
a — in the Tables. Also shown are the results on 
a square domain from the baseline CFL3D code 14 
with freestream values imposed at the boundaries 
x = 0 and x = 1 and with periodicity imposed in 
the y direction. A random perturbation was im- 
posed in the interior to the density held only, so that 
the full system of equations emulates the scalar con- 
vection equation analyzed here. The Mach number 
was 0.5 for the CFL3D computations but because 
the residual equations recognize a contact discon- 
tinuity exactly, the results are independent of the 
Mach number; the only modification required for 
correspondence was to the time step, because the 
system of equations bases the time step on the maxi- 
mum eigenvalue of the full system. For the multigrid 
computation in either case, the usual W(1,0) cycle 
is used. 

The first-order comparisons show that even 
though the asymptotic rate is lower with the single- 
grid scheme, the effective convergence is much bet- 
ter for the multigrid scheme. The single-grid scheme 
shows a clear dependence on the mesh size and the 
multigrid rate is nearly grid-independent in both 
the analysis and the computation. The predicted 
bounds for the number of cycles required correlate 
well with the CFL3D results. 

Some calculations were made using a residual 
tolerance of 13 orders of magnitude — an extreme 
value — to illustrate the slow convergence with it- 
eration of the effective error norm to its asymp- 
totic value given by the spectral radius. With the 
Kd = K t = —3 scheme, the residual reduction be- 
tween the last two cycles for the single-grid scheme 
was 0.55, 0.59, 0.63, 0.68 for N x = N y = 8, 16, 32, 64, 
respectively, as compared to the spectral norm of 0.5. 

Next we consider the convergence of the second- 
order scheme K t = 0 with the first-order implicit AF 
scheme (Kd = —3) for a = 30 deg, both with and 
without multigrid. The CFL number is again se- 
lected as 1 + tana. The initial norms are shown in 
Table 5 and indicate a spectral norm for multigrid 
that is again slightly higher than that of the sin- 
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N x 

AF 

ALJ 

ALRB 

8 

5 

3 

2 

16 

5 

4 

3 

32 

5 

4 

3 

64 

5 

4 

3 


N x 

AF 

ALJ 

ALRB 

Unfactored 

8 

15 

11 

9 

7 

16 

19 

16 

10 

8 

32 

25 

21 

13 

10 

64 

31 

39 

21 

12 


Table 7. Number of updates, Nu, to reduce 1 1 G* 1 1 2 
by 1/10 for multigrid (MG) with various relaxation 
schemes; \9 y \ < 2n, k = —3, a = 30 deg 

gle grid. The predictions of the analysis with corre- 
sponding results from CFL3D are shown in Table 6. 
Two sets of computations are shown; the results in 
parentheses are the results with freestream condi- 
tions imposed along all boundaries. The single-grid 
analyses and computations show a clear dependence 
on the mesh size; as we expect, the corresponding 
multigrid performance is not quite grid-independent, 
because even the unfactored scheme shows such a 
dependence. For both of these k values, the pre- 
dicted bounds for the number of cycles required cor- 
relate well with the numerical results. The number 
of cycles without periodicity are generally lower than 
with periodicity imposed, but approaches the same 
number of cycles as the grid is refined. Although 
not shown here, similar results were obtained with 
the K t = — 1 formulation; the only differences were 
that the number of cycles was approximately 20 per- 
cent lower with this scheme, as might be expected 
because the dissipation levels are higher. 

First-Order Multigrid Scheme 

As a model for the convergence of the linear im- 
plicit matrix equation, we consider the multigrid 
scheme with n t = Kd = —3 and with various relax- 
ation schemes. Table 7 shows results corresponding 
to a V(1,0) FAS multigrid cycle. Because this partic- 
ular model is a linear equation with consistent target 
and driver schemes, the FAS cycle is equivalent to 
a CS cycle. It is clear that all of the schemes con- 
verge rapidly and show little variation in the num- 
ber of iterations to reach convergence as the mesh 
is refined. Although not shown here, single-grid cal- 
culations showed a clear doubling of the number of 
iterations on each successive mesh refinement. 

Second-Order Multigrid Scheme 

The convergence for the multigrid scheme with no 
subiterations is shown in Table 8; the parameters 
are the same as those for the results in Table 7 ex- 
cept that Kt = 0. There is some dependence of the 
number of cycles to reach convergence, as expected, 
because even if we eliminate the factorization errors, 
there is dependence on the mesh density (Table 1). 


Table 8. Number of updates, Nu, to reduce 1 1 G 1 1 2 
by 1/28 for defect-correction multigrid (MG) with 
various relaxation schemes; \9 y \ < 27T, K t = 0 ,Kd = 
—3, a = 30 deg 



N x 

N s = 1 

N s = 2 

II 

CO 

N s = 5 

SG 

8 

12 

18 

24 

40 

SG 

16 

23 

24 

30 

50 

SG 

32 

30 

32 

42 

65 

SG 

64 

57 

50 

63 

95 

MG 

8 

11 

16 

21 

35 

MG 

16 

16 

16 

24 

40 

MG 

32 

21 

24 

30 

50 

MG 

64 

39 

36 

42 

65 


Table 9. Number of updates, Nu, to reduce 1 1 G 1 1 2 
by 1/28 for the ALJ single-grid (SG) and multigrid 
(MG) schemes; \9 y \ < 27 T,K t = 0 , /c<j = —3, a = 
30 deg 

The best performance is attained with the ALRB 
scheme; it degrades little from the unfactored multi- 
grid scheme on coarser meshes. As the mesh is re- 
fined, the AF scheme is more competitive because 
its performance degrades less as the mesh is refined; 
on the N x = 64 mesh, it is actually more efficient 
than the ALJ scheme. 

Effect of Subiterations. (N s > 1) 

We consider the effect of subiterations for the 
defect-correction scheme corresponding to K t = 0 for 
a = 30 deg; Tables 9-10 show results for both the 
single-grid and the multigrid schemes with (iq = 
1 ,i >2 = 0) and with the ALJ scheme. Regarding 
the error reduction, there is some benefit of a few 
subiterations, especially as the mesh is refined. Too 
many subiterations are clearly not efficient when the 
total number of updates, as used here, is considered. 
These results are in qualitative agreement with prac- 
tical calculations for the full systems of equations 
because N s = 3 provided good performance for the 
results shown subsequently. In those situations, us- 
ing 2 subiterations was optimal when stability was 
maintained but was not robust for large time steps. 
As shown in Table 10, for the residual reduction no 
benefit appears at all; the fastest residual reduction 
occurs with N s = 1, and the disparity with addi- 
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tional subiterations grows as the mesh is refined. 
These results are in qualitative agreement with the 
single-grid local mode analysis of MacCormack and 
Pulliam 19 for the full system of equations using the 
defect correction scheme with n t = 1/3; a few subit- 
erations drove the spectral norms below unity. 

Efficiency Comparisons 

All of the relaxation/smoothers considered here 
are amenable to vectorization and parallel imple- 
mentation on computers because they are either Ja- 
cobi, red-black, or factored schemes. 

The operation counts for the ALJ and ALRB 
methods are only slightly increased over current 
block approximate-factorization methods. The over- 
head is the requirement for additional iterations of 
the linear system before updating the nonlinear sys- 
tem. However, because the measured computer time 
for the diagonalized line inversions of the AF scheme 
for two-dimensional simulations is a factor of four 
less than the corresponding block inversions, the 
number of updates of the ALJ and ALRB schemes 
must show an appreciable improvement over the AF 
scheme to be viable. In general, this is not attained 
for the isotropic cases considered here. Substantial 
differences would be expected in cases of high grid 
anisotropies, because the AF scheme is known not 
to be optimal in its present form, as studied exten- 
sively by Buelow, et al. 24 The operation counts for 
the block inversions can be reduced substantially by 
saving the implicit Jacobians and LU decomposition 
of the line inversions; the computational results in- 
dicate the method is not sensitive to updating these 
Jacobian entries. In the current implementation, for 
example, the Jacobians and the LU decomposition 
of the implicit line solutions are computed initially 
and reused for each of the updates at a given mesh 
resolution. For a V(2,l) FAS cycle with three subit- 
erations, the block LU decompositons are done once 
instead of six times. This saves about a factor of two 
in computer time and makes the ALJ and ALRB 
schemes competitive with the diagonalized AF for 
isotropic cases, albeit at the cost of increased stor- 
age. Greater re-use could be made at the cost of 
additional storage. 

Large-Scale Computations 

Summarized below are the large-scale computa- 
tions that have been made for several inviscid and 
viscous flows, including viscous flow over a flat plate 
and the separated flow over an airfoil. 



N x 

N s = 1 

N s = 3 

N s = 5 

SG 

8 

26 

54 

90 

SG 

16 

49 

66 

105 

SG 

32 

72 

81 

130 

SG 

64 

82 

108 

170 

MG 

8 

34 

60 

90 

MG 

16 

60 

66 

100 

MG 

32 

68 

72 

110 

MG 

64 

70 

90 

140 


Table 10. Number of updates, Nu , to reduce 1 1 H 1 1 2 
by 10 -4 for the ALJ single-grid (SG) scheme; \6 y \ < 
2-7T, K t = 0 , Kd = —3, a = 30 deg 

Bump in a Channel 

Extensive calculations were made for the inviscid 
flow over a 10-percent thick profile with a sin 2 profile 
in a channel at a Mach number of 0.5. A prediction 
methodology based on two-grid local mode analysis 
for a system of equations, similar to that of Mulder, 9 
showed that an asymptotic rate of 0.5 per V(1,0) 
FAS cycle could be attained with the ALJ scheme 
with either subiterations or CS multigrid applied to 
the linear system. Numerical calculations confirmed 
this; systematic variations of the number of subit- 
erations required indicated that three subiterations 
without the CS were sufficient to allow large time 
steps on the order of Courant numbers of 100-300. 
Calculations made with the CS multigrid applied to 
the linear system allowed larger time steps, but pro- 
vided no overall advantage in convergence. Good 
results were obtained with a V(2,l) FAS cycle with 
three subiterations of the linear system; the conver- 
gence rate corresponded to 0.5 1 / 3 = 0.8 per fine-grid 
update. Convergence to within 5 percent of trunca- 
tion error, as measured by the computed drag, oc- 
curred within two cycles (or equivalently, 9 fine-grid 
updates). 

Ms. Carolyn Dear of Mississippi State University 
provided some extensive benchmark evaluations of 
the baseline AF solver during an intern period at 
NASA Langley during the summer of 1998. The 
results showed that the baseline solver in CFL3D 
provided grid-independent convergence rates using 
flux- vector splitting as the Riemann solver, but that 
with flux-difference splitting, the results were grid 
dependent. The remedy was to apply an entropy 
fix to the steady residual equations so the minimum 
eigenvalue of the flux Jacobian matrices did not fall 
below 0.05 of the maximum value.* Interestingly, 

*This is the only modification made to the steady residual 
operator of the baseline solver CFL3D over the course of this 
work. 
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the ALJ formulation did not require this modifica- 
tion to obtain grid-independent convergence. This 
behavior is attributed to the system of equations and 
could be studied by extending the methodology here 
to systems. In comparisons of the two approaches 
that used the FMG approach with this entropy fix 
applied, the V(2,l) FAS cycle with 3 subiterations of 
ALJ as the relaxation scheme was competitive with 
but did not surpass the baseline solver. 

NACA 0012 Airfoil 

A series of airfoil calculations indicated results 
similar to the inviscid simulations above. Again con- 
vergence of lift and drag was obtained in a few cycles 
of the ALJ V(2,l) FAS scheme with 3 subiterations. 
However, the baseline solver was already adequate 
to provide efficient multigrid solutions. 

For calculations from impulsive freestream initia- 
tions, instead of through an FMG process, the use of 
large time steps presented some difficulties; to over- 
come these difficulties, the calculation was started 
at moderate Courant numbers and ramped to large 
time steps over a few cycles. For the impulsive start, 
there was a decided improvement over the baseline 
scheme of the ALRB (or ALJ) subiteration multigrid 
scheme. 

Flat Plate Boundary Layer 

The simulation of viscous flow over a flat plate 
was done for a range of Reynolds numbers with 
a computational domain extending from x = —1 
to x = 2, with no-slip conditions imposed start- 
ing at x = 0.5. Constant-pressure boundary con- 
ditions were used downstream, along with speci- 
fied total-pressure, entropy, and velocity-direction 
boundary conditions at inflow. The aforementioned 
two-grid local mode analysis indicated that, if the 
linear system was solved either through subiteration 
or CS multigrid, the same convergence rate of 0.5 
per V(1,0) cycle could be attained. Computations 
confirmed the analysis; the calculation was insensi- 
tive to the grid stretching, and the convergence rate 
of the residual was better than 0.55 per V(1,0) cycle. 

As the grid was stretched, however, it became 
more difficult to start the solution. Large time steps 
could be taken, but the increase from small values 
had to occur slowly. This difficulty was remedied by 
applying an entropy Lx to the implicit side of the 
equations; the minimum eigenvalue was constrained 
to be on the order of 0.1 of the maximum eigenvalue. 
With this modiRcation, large time steps could be 
taken from impulsive starts. When the CS multi- 
grid was used for the implicit system, the conver- 
gence rate of the linear system was better than 0.2 
per W(2,l) CS cycle; thus one W(2,l) CS cycle was 


quite sufficient to solve the linear system to a toler- 
ance of approximately one order of magnitude. The 
same overall efficiency of the nonlinear residual con- 
vergence could be attained if only 3 subiterations, 
instead of the CS multigrid, were used for the linear 
system. 

In comparisons with the baseline scheme, a cal- 
culation was made on a very highly stretched mesh 
for laminar Row starting from freestream values. To 
attain convergence of the integrated drag coefficient, 
the V(2,l) FAS cycle with 3 subiterations of either 
the ALJ or the ALRB scheme showed a factor of ten 
reduction in computer time over the baseline scheme 
The local skin friction values on the plate converged 
in just a few cycles with this scheme, consistent with 
the findings of Koren 12 using a CS multigrid defect 
correction scheme. With an FMG cycle, the im- 
provement was less — approximately a factor of five. 
Calculations were made using an algebraic turbu- 
lence model that showed a similar improvement over 
the baseline scheme. 

Airfoil with Laminar Separation 

The convergence efficiency for the flow over a 
NACA 0012 airfoil at a Mach number of 0.8, a = 
10 deg, and a Reynolds number of 500 was investi- 
gated for a series of meshes. For this simulation, the 
region of separation extends over most of the air- 
foil upper surface, from x/c = 0.35 to x/c = 0.97. 
Generally, the lift and drag converged in only a few 
V(2,l) FAS cycles on meshes varying from 65 x 25 
to 641 x 129. The grid was a C-type mesh; in order 
to obtain grid-independent convergence, it was nec- 
essary to construct the implicit lines in the wake so 
that they spanned the wake. The asymptotic conver- 
gence was virtually constant on all meshes; reduction 
of the residual ten orders of magnitude was attained 
in 40 V(2,l) FAS cycles, corresponding to 120 fine- 
grid updates using 3 subiterations of the ALRB or 
the ALJ scheme for each relaxation. 

Conclusions 

A hierarchical multigrid algorithm for efficient 
steady solutions to the two-dimensional compress- 
ible Navier-Stokes equations has been developed and 
demonstrated. The general algorithm applies FAS 
multigrid to a nonlinear target residual equation and 
CS multigrid to a linearized defect correction im- 
plicit equation. The computational work scales as 
the total number of unknowns in the simulation, N , 
times the square of the number of equations at each 
grid point, m, because solutions to block-tridiagonal 
matrices of block size m are needed. Multigrid anal- 
yses that include the effect of boundary conditions 
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in one direction are used to estimate the convergence 
rate of the algorithm for a model convection equa- 
tion. Three alternating-line-implicit algorithms are 
compared in terms of efficiency. The analyses in- 
dicate that full multigrid efficiency is not attained 
in the general case; the number of cycles to at- 
tain convergence is dependent on the mesh for high- 
frequency cross-stream variations. Of the three algo- 
rithms investigated, the baseline AF solver provided 
the overall best for isotropic grids, considering that 
its cost per update with the diagonal version is a 
factor of four cheaper than the full block inversions 
associated with the ALJ and ALRB schemes. Nu- 
merical simulations for a series of flows indicated 
the ALJ and ALRB were only competitive with the 
baseline scheme for inviscid flows but were clearly 
superior for highly stretched viscous mesh simula- 
tions. With a V(2,l) FAS cycle with 3 subiterations 
of either the ALJ or ALRB schemes, convergence of 
lift and drag to within truncation error occurred in 
two multigrid cycles. 
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Appendices 

I Implementation of ALJ Scheme 

The ALJ scheme is implemented with two sweeps 

through the mesh, as below for the scalar convection 

equation on mesh h. 

(— + S x + tSy )d(Ae*) = -L t (e) n (49) 


+ S? + tSy) d ( Ae) = -L t (e) n - (S x - S°) d ( Ae*) 

(50) 

where the superscript D denotes the diagonal con- 
tribution of the operator. Substituting from Eq(49) 
for (J a; ) ( j(Ae*) into Eq. (50), then 

(t 7 + $x + i dy)d(A.e) = (— + (if + t Sy ) d (Ae*) 


Now substituting from Eq. (49) again, the scheme 
can be written as an approximate factorization 
scheme, referred to as the DDADI scheme and used 
by MacCormack and Pulliam 19 



The scheme above is quite similar to the damped 
ALJ scheme proposed by Mulder. 9 In Mulder’s 
work, as in the relaxation methods used by Thomas, 
Walters, and Van Leer, 20 the solution and target 
residuals are updated after each sweep. In the con- 
text of the convection scheme studied here, the two 
sweeps correspond to 

(4 + 2t Sy)d(e n+1 - e n ) = -L t (e) n (51) 


(2 df + t 6y)d(e n+2 - e n+1 ) = -L t (e) n+1 (52) 

Even though no time step explicitly occurs, the two 
left sides of Eqs (51-52) are equivalent to the left 
sides of Eqs (49-50) if, in the latter, we use a CFL 
of (t + 1 )/t and t + 1 on the ^-implicit and y- 
implicit sweeps, respectively. Thus, for non-grid- 
aligned flows, the time steps are not much greater 
than those of the AF scheme. Within such a corre- 
spondence, if the target scheme is the same as the 
implicit scheme, as for example with subiterations 
or with a first-order-accurate scheme, these two ap- 
proaches would be identical. In the general case, 
the two schemes would behave differently because 
of the above differences in the way the first sweep 
influences the second sweep. 

II Implementation of ALRB Scheme 

The ALRB scheme is implemented very similarly 
to the above scheme, except that the implicit lines 
are updated in a red-black fashion, as below, assum- 
ing that K d = —3. 

= 

-L t (e) n : B x (53) 
= 

-L t (e) n -(tS y - t6°)(Ae*) : R x (54) 


14 

American Institute of Aeronautics and Astronautics 



AIAA-99-0785 


+ t 3y)d{Ae**) - 

-L t (e) n -(S X -S B ) d (Ae*) : B y (55) 

^ 4)4Ae**) = 

-L t (e) n -(^-<yf) d (Ae**) : R y (56) 

where ( R X ,B X ) and ( R y ,B y ) refer to a sequencing 
of the (x, y) lines in red-black fashion. Now Eqs (53) 
and (54) can be written as 


(N x ) d (\e*) = -L t (e) n (57) 

where N x denotes the lower triangular part of the 
full driver operator I /At + L d after a resequencing 
of the x = constant lines in a red-black fashion. Like- 
wise the second set of equations, by adding and sub- 
tracting the term L d ( Ae*) to the right-hand sides, 
can be written as 

(N y ) d (Ae** - Ae*) = -L t ( e) n - L d ( Ae*) (58) 

where N y denotes the lower triangular part of the 
full driver operator I /At + L d after a resequencing 
of the y = constant lines in a red-black fashion. A 
composite operator can then be written as 

Ae** = -{N// 1 + N// 1 - N// 1 L d N~ 1 )L t (e) n 
= -(T d )- 1 L t ( e ) n (59) 

III Amplification Matrix for ALRB 

The amplification matrix for the step correspond- 
ing to Eq. (53) can be written as 

G 1 = i C(G B1 , G m ) (60) 


where 


C(G, H) = G + H + 


G21 — H21 G22 — H22 

Gn-Hn G 12 - H 12 

(61) 


and where G fil = I, 

G B1 = I — M -1 L t (62) 


and M is a diagonal matrix corresponding to the x- 
implicit/j/- Jacobi approximation, i.e. , M = (1/ At + 
S x + tSy) d along the diagonal entries. The amplih- 
cation matrix for the step corresponding to Eq. (54) 
can be written as 


G 2 = ^C{G B2 ,G R2 ) (63) 

where G m = G 1 , 

G^ 2 = (I - M _1 L d )G 1 + M _1 (L d - L t ) (64) 

The amplihcation matrix after the completion of the 
j/-implicit/*-RB sweep corresponding to Eqs. (55) 
and (56) can be written as 


G = (I - N" 1 L d )G 2 + N“ 1 (L ( j - L t ) (65) 

where N y is the matrix corresponding to a y- 
implicit/*-RB approximation to L d . There is some 
dependence of the iteration on the order of the points 
taken; here, the first sweep is for the odd points, 
which includes the point closest to the inflow bound- 
ary. 

IV Restrictions and Prolongations 

The restriction and prolongation matrices R and 
P associated with the ^-direction coarsening are of 
dimensions N x / 2 x N x and N x x N x /2, respectively, 
as below. 


n 

n 

r’i 

0 

0 

0 

0 

0 

0 

r 2 

r 1 

r 1 

r 2 

0 

0 

0 

0 

0 

0 

r 2 

r 1 

r 1 

r 2 

0 

0 

0 

0 

0 

0 

r 2 

r 1 

r 1 


Pi 

Po 

0 

0 

0 

0 

0 

0 

Pi 

P2 

P3 

0 

0 

0 

0 

0 

P2 

Pi 

Po 

0 

0 

0 

0 

0 

Po 

Pi 

P2 

P3 

0 

0 

0 

0 

P3 

P2 

Pi 

Po 

0 

0 

0 

0 

0 

0 

0 

0 

0 

Po 

Pi 

P2 

0 

0 

0 

0 

0 

P3 

P2 

Pi 

0 

0 

0 

0 

0 

0 

Po 

Pi 


where for the usual prolongations and restrictions 
{P0,Pl,P2,P3} = ^{0,3, 1,0} 
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{n,r 2 } = ^{ 1 , 0 } 

and for the higher-order prolongations and restric- 
tions, denoted with asterisks, 

{Po,Pi,P2,P3} = ^-{-21,315, —5} 

{ri,r 2 } = ^{4,-1} 

6 

V A Curious Case of Norms 

As an illustration of the difference between the L 2 
and spectral norms, the convergence for grid-aligned 
flow, t = 0, is considered. With the k = —1 differ- 
encing scheme, Desideri and Hemker 25 have shown 
that the predicted asymptotic error decay rate is 
very fast, p( G) < 0.5, but that the asymptotic rate is 
achieved only after 2 N x cycles. This odd behavior is 
associated with a deficient set of eigenvectors of the 
iteration matrix; the initial slow decay of the resid- 
ual over the first 2N X iterations is termed a pseudo- 
convection phase by Desideri and Hemker. 25 The L 2 
norms of G and H remain close to unity for 2 N x cy- 
cles and then fast convergence is attained. Because 
the left and right side matrices are lower triangular, 
it is straightforward to show that the asymptotic 
spectral radius is determined by the ratio of the di- 
agonal terms between the driver and target schemes, 
i.e., 


j/'r-'i - |i | 

)_ 

By scaling the left side implicit matrix by 3/2, equiv- 
alent to an underrelaxation 26 of the right side resid- 
ual by 2/3, the spectral norm becomes zero and the 
L 2 norms of both G and H are 1/3. Now the conver- 
gence is quite fast — so fast, in fact, that the asymp- 
totic rate of zero, which is indeed achieved after N x 
cycles, is usually not seen before machine zero is 
reached. 
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